Keir Fraser [Mon, 14 Dec 2009 07:58:47 +0000 (07:58 +0000)]
xend: fix a typo introduced by changeset 20621:f9392f6eda79
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Mon, 14 Dec 2009 07:58:15 +0000 (07:58 +0000)]
Fix bug in c/s 20332 "Add commands to hotplug usb devices to hvm guests"
Signed-off-by: James Song Wei <jsong@novell.com>
Keir Fraser [Mon, 14 Dec 2009 07:57:23 +0000 (07:57 +0000)]
Disable watchdog in dump_registers
Avoids triggering watchdog if serial port output is slow.
Signed-off-by: Andrew Lyon <andrew.lyon@gmail.com>
Keir Fraser [Mon, 14 Dec 2009 07:56:21 +0000 (07:56 +0000)]
Fix losetup -f not working on SLES10
Signed-off-by: Gary Grebus <gary.grebus@oracle.com>
Keir Fraser [Mon, 14 Dec 2009 07:55:35 +0000 (07:55 +0000)]
Fix clock for XCP Windows PV drivers on restore
This fixes a timekeeping issue for 32 bit guests running XCP Windows
paravirtual drivers on a 64 bit hypervisor where their clock was set
to the 1970s after live migration or restore. Thanks to Paul Durrant
for helping track this down.
From the original XCP patch:
Arrange that the wallclock time fields in the shared_info structure
are set correctly in 32 bit HVM guests on a 64 bit hypervisor. HVM
guests on a 64 bit hypervisor always start with a 64 bit shared info,
and then change to a 32 bit one if they're using 32 bit drivers. The
32-bit and 64-bit shared info structures put their wallclock times in
slightly different places, and so the wallclock time needs to be
regenerated when you do the conversion.
It can be argued that we should convert the other fields of shared
info at the same time (e.g. if an event channel is pending beforehand,
it should be pending afterwards), but that's much harder to arrange,
because the 32 bit structure can't represent all the states which the
64 bit one can. Just setting the time seems to be sufficient for
our purposes.
Signed-off-by: Steven Smith <steven.smith@citrix.com>
Signed-off-by: Keith Coleman <keith@scaltro.com>
Keir Fraser [Mon, 14 Dec 2009 07:54:53 +0000 (07:54 +0000)]
cpuidle: fix the menu governor to enhance IO performance
this is a revised version of linux upstream commit
69d25870f20c4b2563304f2b79c5300dd60a067e:
"
cpuidle: fix the menu governor to boost IO performance
Fix the menu idle governor which balances power savings, energy
efficiency and performance impact.
The reason for a reworked governor is that there have been serious
performance issues reported with the existing code on Nehalem server
systems.
To show this I'm sure Andrew wants to see benchmark results:
(benchmark is "fio", "no cstates" is using "idle=poll")
            no cstates   current linux   new algorithm
1 disk      107 Mb/s     85 Mb/s         105 Mb/s
2 disks     215 Mb/s     123 Mb/s        209 Mb/s
12 disks    590 Mb/s     320 Mb/s        585 Mb/s
In various power benchmark measurements, no degradation was found by
our measurement&diagnostics team. Obviously a small percentage more
power was used in the "fio" benchmark, due to the much higher
performance.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"
The Xen version uses mostly the same logic, with one exception: Linux
uses nr_iowait and loadavg to track pending I/O requests, but these
are not visible to Xen, so Xen uses the do_irq frequency to estimate
I/O pressure. This is not as accurate as the Linux approach; a better
one would be to convey the guest's latency requirement to the
hypervisor via virtual C-states, which could be a future enhancement.
The detailed algorithm description is in the code comments. With this
new algorithm, fio benchmark performance improves by ~5% with 1 disk,
and no power degradation is found in the idle case.
Signed-off-by: Yu Ke <ke.yu@intel.com>
Keir Fraser [Mon, 14 Dec 2009 07:52:22 +0000 (07:52 +0000)]
hvm: Fix CR0.WP=0 emulation. Don't take write emulation path for MMIO.
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Mon, 14 Dec 2009 07:46:57 +0000 (07:46 +0000)]
Add RDTSCP instruction support for HVM VMX guest.
RDTSCP was introduced on Intel platforms with the Nehalem processor.
Like RDTSC, RDTSCP returns the TSC value; in addition, it returns the
low 32 bits of the TSC_AUX MSR. Currently the Linux kernel writes
(node_id << 12 | processor_id) into that MSR, so that when a guest
executes RDTSCP it also gets processor information. This instruction
is supported for HVM only when the hardware has this capability
(indicated by cpuid).
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
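The TSC_AUX layout mentioned in the RDTSCP entry above can be illustrated
with a small sketch. This is not code from the patch; the function names
are hypothetical, and the 12-bit width of the low field is inferred only
from the "<< 12" in the description.

```python
CPU_BITS = 12                      # width implied by "node_id << 12"
CPU_MASK = (1 << CPU_BITS) - 1


def pack_tsc_aux(node_id, cpu_id):
    """Build a 32-bit TSC_AUX-style value as (node_id << 12) | cpu_id."""
    if not 0 <= cpu_id <= CPU_MASK:
        raise ValueError("cpu_id does not fit in the low 12 bits")
    return ((node_id << CPU_BITS) | cpu_id) & 0xFFFFFFFF


def unpack_tsc_aux(aux):
    """Split a TSC_AUX-style value back into (node_id, cpu_id)."""
    return aux >> CPU_BITS, aux & CPU_MASK
```

A guest that reads the auxiliary value alongside the TSC could use the
second helper to recover which node and processor the reading came from.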
Keir Fraser [Mon, 14 Dec 2009 07:45:04 +0000 (07:45 +0000)]
Pvrdtscp: move write_rdtscp_aux() to paravirt_ctxt_switch_to()
Currently write_rdtscp_aux() is placed in update_vcpu_system_time(),
which is called by schedule() before context_switch(). This breaks
the HVM guest TSC_AUX state because at this point the MSR hasn't been
saved for HVM guests. So move the call to the point where a PV vcpu
is really scheduled in.
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Keir Fraser [Fri, 11 Dec 2009 09:17:09 +0000 (09:17 +0000)]
docs: Fixes for README
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 11 Dec 2009 09:07:57 +0000 (09:07 +0000)]
Update Xen version to 4.0.0-rc1-pre
Keir Fraser [Fri, 11 Dec 2009 09:01:15 +0000 (09:01 +0000)]
mini-os: Fix memory leaks in xs_read() and xs_write()
xenbus_read() and xenbus_write() allocate memory for an error
message if any error occurs; this memory should be freed.
Signed-off-by: Yu Zhiguo <yuzg@cn.fujitsu.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 09:00:40 +0000 (09:00 +0000)]
libxenlight: Disable unneeded C++ binding for libconfig
To avoid making a C++ compiler a requirement for a Xen build, we
should disable the (unneeded) C++ library generation for the embedded
libconfig.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Fri, 11 Dec 2009 08:59:54 +0000 (08:59 +0000)]
tools: improve NUMA guest placement when ballooning
The "guest to a single NUMA node" constraint algorithm does not work
well when we do ballooning. Ballooning and NUMA don't play together
anyway, as Dom0, and thus ballooning, is not NUMA aware. I am working
on this, but it will not be ready for the Xen 4.0 release window. The
usual ballooning situation results in an empty candidate list, as
no node has enough free memory to host the guest. In this case the
code simply picks the first node, again and again, because all
nodes without enough memory are ultimately penalized with the same
maxint value (regardless of the actual load). This patch changes
this to use a relative penalty in the not-enough-memory case, so
that low-load low-memory nodes will be used at some point. A
half-loaded node has been shown to be a good value, as an unbalanced
system is much worse than non-local memory access for guests.
Regardless of that, you should restrict Dom0 on a NUMA system to a
reasonable memory size, so that ballooning is not necessary most of
the time. In this case the guest's memory will be NUMA local.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
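The relative-penalty idea from the NUMA placement entry above can be
sketched as follows. This is only an illustration, not the xend code:
the node representation, the normalized "load" value (1.0 meaning a
fully loaded node) and the function name are all assumptions.

```python
HALF_LOADED = 0.5   # assumed penalty: the load of a half-loaded node


def pick_node(nodes, need_kb, penalty=HALF_LOADED):
    """Return the index of the node with the lowest score.

    Each node is a dict with 'load' and 'free_kb'. A node that lacks
    enough free memory gets a relative penalty added to its load,
    rather than one shared maxint value, so the actual load still
    decides between memory-starved nodes.
    """
    def score(node):
        extra = penalty if node["free_kb"] < need_kb else 0.0
        return node["load"] + extra

    return min(range(len(nodes)), key=lambda i: score(nodes[i]))
```

With a shared maxint penalty every memory-starved node ties and the
first one always wins; with the relative penalty the least loaded of
them is chosen instead.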
Keir Fraser [Fri, 11 Dec 2009 08:58:06 +0000 (08:58 +0000)]
memory hotadd 7/7: hypercall support
The basic work flow to handle the memory hotadd is:
Update node information
Map new pages to xen 1:1 mapping
Setup frametable for new memory range
Setup m2p table for new memory range
Put the new pages to domheap
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:57:30 +0000 (08:57 +0000)]
memory hotadd 6/7: Allocate L3 table for whole direct mapping range if
memory hotplug is supported.
Hot-added memory may need a new L4 entry for the 1:1 mapping. This
patch sets up all L4 entries for the 1:1 mapping if memory hotadd is
needed, so that we don't need to sync the guest page table in the
page fault handler.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:56:50 +0000 (08:56 +0000)]
memory hotadd 5/7: Sync mapping changes caused by memory
hotplug in page fault handler.
In the compat guest case, the compat m2p table is copied, not
directly mapped in L3, so we have to sync it. The direct mapping range
may change, and we need to sync it with the guest's table.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:56:04 +0000 (08:56 +0000)]
memory hotadd 4/7: Setup frametable for hot-added memory
We can't use alloc_boot_pages for memory hot-add, so change it to use
the pages range passed in.
One change worth noting: when memory hotplug is needed, we have to
set up the initial frametable aligned to the pdx index (i.e. the
pdx_group_valid), to make sure mfn_valid() still works after max_page
is no longer the maximum.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:55:08 +0000 (08:55 +0000)]
memory hotadd 3/7: Function to share m2p tables with guest.
The m2p tables should be shared with the guest, as they will be mapped
read-only by the guest. This logic is similar to what happens in
subarch_init_memory(), but we need to check that the mapping has just
been set up.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:54:37 +0000 (08:54 +0000)]
memory hotadd 2/7: Destroy m2p table for hot-added memory when hot-add failed.
Since the m2p table should not be in use when we destroy it, we don't
need to consider cleaning the head/tail mappings that may exist before
hot-add.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:53:57 +0000 (08:53 +0000)]
memory hotadd 1/7: Setup m2p table for hot-added memory
When new memory is added to the system, we need to update the m2p table
to cover the new memory range.
At hot-add time it is difficult to allocate contiguous pages, so we
allocate the memory from the newly added memory range. This also
improves locality in the NUMA case.
We don't support 1G mappings for hot-added memory, because AFAIK
currently hot-plugged memory will not be that large.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Fri, 11 Dec 2009 08:52:17 +0000 (08:52 +0000)]
PVUSB: xm/xend support
You can see the following slides to understand the usage.
http://www.xen.org/files/xensummit_intel09/PVUSBStatusUpdate.pdf
Limitations:
"xm usb-hc-create" accepts up to 16 ports, but the current usbfront
can work with only up to 15 ports. This may be a bug and I'm preparing
to fix it.
This xm/xend support requires linux-2.6.18-xen.hg c/s 939 or above.
I recommend latest tip.
Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
Keir Fraser [Fri, 11 Dec 2009 08:51:21 +0000 (08:51 +0000)]
docs: Example usage of pvrdtscp algorithm
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Fri, 11 Dec 2009 08:50:13 +0000 (08:50 +0000)]
x86: Allow HPET to set timers more sloppily by seeing each CPU's
acceptable deadline range, rather than just deadline start.
Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:47:51 +0000 (08:47 +0000)]
libxenlight: fix cd-insert cli arguments parsing
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:46:02 +0000 (08:46 +0000)]
libxenlight: add a cli option to exit right after domain creation
This patch adds a command line option in xl to exit right after domain
creation and not wait in background for the death of the domain.
Users should be aware that if they use this option, they always have
to destroy the domain manually after the guest shuts down.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:45:26 +0000 (08:45 +0000)]
libxenlight: fix two memory related issues
- LIBXL_MAXMEM_CONSTANT is 1MB but must be expressed in KB;
- xc_dom_linux_build should take target_memkb instead of max_memkb as
an argument.
Thanks to Andres for spotting the latter.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Fri, 11 Dec 2009 08:44:33 +0000 (08:44 +0000)]
domain builder: multiboot-like module support
This defines how multiple modules can be passed to a domain by packing
them together into a "multiboot module" in a way very similar to the
multiboot standard. An SIF_ flag is added to announce such a package.
This also adds a packing implementation to PV-GRUB.
Signed-Off-By: Samuel Thibault <samuel.thibault@ens-lyon.org>
Keir Fraser [Fri, 11 Dec 2009 08:42:28 +0000 (08:42 +0000)]
PoD: appropriate BUG_ON when domain is dying
The BUG_ON(d->is_dying) in p2m_pod_cache_add(), introduced in
c/s 20426, is not correct, since dom->is_dying is set asynchronously.
For example, MMU_UPDATE hypercalls from qemu and the
DOMCTL_destroydomain hypercall from xend can be issued simultaneously.
This patch also makes p2m_pod_empty_cache() wait via spin_barrier
until any other PoD operation has finished.
Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Wed, 9 Dec 2009 10:59:31 +0000 (10:59 +0000)]
x86-32/pod: fix map_domain_page() leak
The 'continue' in the if() part of the conditional at the end of
p2m_pod_zero_check() was causing this, but there also really is no
point in retaining the mapping after having checked page contents,
so fix it both ways. Additionally there is no point in updating
map[] at this point anymore.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 9 Dec 2009 10:58:52 +0000 (10:58 +0000)]
tools: simplify PYTHON_PATH computation (and fixes for NetBSD)
Doesn't work when build-time python path differs from install-time. Do
we care about this given tools should be packaged/built for the
specific run-time distro?
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Wed, 9 Dec 2009 10:46:11 +0000 (10:46 +0000)]
tmem, xentop: Report a few key per-domain tmem statistics in xentop.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 9 Dec 2009 10:44:56 +0000 (10:44 +0000)]
tmem: reclaim minimal memory proactively
When a single domain is using most/all of tmem memory
for ephemeral pages belonging to the same object, e.g.
when copying a single huge file larger than ephemeral
memory, long lists are traversed looking for a page to
evict that doesn't belong to this object (as pages in
the object for which a page is currently being inserted
are locked and cannot be evicted). This is essentially
a livelock.
Avoid this by proactively ensuring there is a margin
of available memory (1MB) before locks are taken on
the object.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
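A minimal sketch of the proactive reclaim described in the tmem entry
above: make sure a margin of free memory exists before any object lock
is taken. The 4KB page size, the callback and the function name are
assumptions for illustration, not the tmem implementation.

```python
PAGE_SIZE_KB = 4                        # assumption: 4KB pages
MARGIN_PAGES = 1024 // PAGE_SIZE_KB     # the 1MB margin from the entry


def ensure_margin(free_pages, evict_one, margin=MARGIN_PAGES):
    """Evict pages until `margin` pages are free or nothing is left.

    `evict_one` is a hypothetical callback that evicts one ephemeral
    page and returns True, or returns False when nothing is evictable.
    It is meant to run before the object lock is taken, so eviction
    never has to skip pages of the object being filled.
    Returns the resulting free page count.
    """
    while free_pages < margin and evict_one():
        free_pages += 1
    return free_pages
```

Calling this ahead of the locked insert path keeps the long list walk
out of the section where most candidate pages are locked.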
Keir Fraser [Wed, 9 Dec 2009 10:44:11 +0000 (10:44 +0000)]
libxenlight: implement libxl_set_memory_target
This patch adds a target_memkb parameter to libxl_domain_build_info to
set the target memory for the VM at build time and a new function
called libxl_set_memory_target to dynamically modify the memory target
of a VM at run time. Finally a new command "mem-set" is added to xl
that calls directly libxl_set_memory_target.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 9 Dec 2009 10:43:33 +0000 (10:43 +0000)]
libxenlight: xenstore data path writable by the guest
Make the data path on xenstore writable by the guest
because Citrix pv drivers requires it.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 9 Dec 2009 10:42:53 +0000 (10:42 +0000)]
SRAT memory hotplug 2/2: Support overlapped and sparse node memory arrangement.
Currently the Xen hypervisor uses nodes to keep the start/end address
of each node. It assumes memory among nodes does not overlap, which is
not always true, especially if we have memory hotplug support in the
system.
This patch backports the Linux kernel's memblks to support overlap
among nodes. The memblks are used both for checking conflicts and for
calculating memnode_shift.
Also, currently if no memory is populated in a node at system boot,
the node is unparsed later, and the corresponding CPU's NUMA
information is removed as well. This patch keeps the CPU information.
One thing to note: currently we calculate memnode_shift with all
memory, including un-populated ranges. This should work if the
smallest chunk is not too small. Another option would be flags in the
page_info structure, etc.
The memnodemap is changed from paddr to pdx, both to save space and
because currently most accesses are by pfn.
A flag, mem_hotplug, is added if there is a hotplug memory range.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Wed, 9 Dec 2009 10:41:37 +0000 (10:41 +0000)]
SRAT memory hotplug 1/2: Revert 20053:ebb07c5934c8.
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Keir Fraser [Tue, 8 Dec 2009 14:14:27 +0000 (14:14 +0000)]
hvm: Share ASID logic between VMX and SVM.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 8 Dec 2009 10:33:08 +0000 (10:33 +0000)]
hvm: Pull SVM ASID management into common HVM code where it can be shared.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 8 Dec 2009 07:55:21 +0000 (07:55 +0000)]
Track free pages live rather than count pages in all nodes/zones
Trying to fix a livelock condition in tmem that occurs
only when the system is totally out of memory requires
the ability to easily determine if all zones in all
nodes are empty, and this must be checked at a fairly
high frequency. So to avoid walking all the zones in
all the nodes each time, I'd like a fast way to determine
if "free_pages" is zero. This patch tracks the sum
of the free pages in all nodes/zones. Since I think
the value is modified only when heap_lock is held,
it need not be atomic.
I don't know this for sure, but suspect this will be
useful in other future memory utilization code, e.g.
page sharing.
This has had limited testing, though I did drive free
memory down to zero and up and down a few times with
debug on and no asserts were triggered.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
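The "track free pages live" idea in the entry above amounts to keeping
a running total next to the per-node/per-zone counts. A rough sketch,
with hypothetical class and method names (the real allocator is C code
protected by heap_lock):

```python
class HeapCounters:
    """Per-node/zone free counts plus a running total."""

    def __init__(self, nr_nodes, nr_zones):
        self.avail = [[0] * nr_zones for _ in range(nr_nodes)]
        self.total_avail = 0        # sum over all nodes and zones

    def free_pages(self, node, zone, count):
        # Both counters change together, as they would under one lock.
        self.avail[node][zone] += count
        self.total_avail += count

    def alloc_pages(self, node, zone, count):
        if self.avail[node][zone] < count:
            return False
        self.avail[node][zone] -= count
        self.total_avail -= count
        return True

    def out_of_memory(self):
        # O(1) check instead of walking every zone of every node.
        return self.total_avail == 0
```

The frequent "is everything empty" check then costs one comparison
regardless of how many nodes and zones exist.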
Keir Fraser [Tue, 8 Dec 2009 07:51:30 +0000 (07:51 +0000)]
VT-d: per-iommu domain-id
Currently, Xen uses iommu domain-ids shared across all the VT-d units
in the platform. The number of iommu domain-ids (NR_DID, e.g. 256)
supported by each VT-d unit is reported in the Capability register.
The limitation of the current implementation is that it can support
at most NR_DID domains with VT-d in the entire platform, even though
the platform could support N * NR_DID (where N is the number of VT-d
units). Imagine a platform with several SR-IOV NICs, each supporting
128 VFs: the number of domains could exceed NR_DID.
This patch implements iommu domain-id management per iommu (VT-d
unit), and hence removes the above limitation. It removes the global
domain-id bitmap and instead uses a domain-id bitmap in struct iommu,
and also introduces an array mapping guest domain-ids to iommu
domain-ids, which is used to look up the iommu domain-id when flushing
the context cache or IOTLB. When a device is assigned to a guest, an
available iommu domain-id is chosen from the device's iommu, and the
guest domain-id is recorded in the domain-id mapping array. When a
device is deassigned from a guest, the domain-id bit in the domain-id
bitmap and the corresponding entry in the domain-id map array are
cleared if no other devices under the same iommu are owned by the
guest.
Signed-off-by: Weidong Han <weidong.han@intel.com>
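The per-iommu bookkeeping described in the VT-d entry above can be
sketched like this. It is an illustration only: the class, its fields
and the per-guest device counter are assumptions standing in for the
bitmap and map array kept in struct iommu.

```python
class Iommu:
    """One VT-d unit with its own domain-id space."""

    def __init__(self, nr_did):
        self.domid_bitmap = [False] * nr_did   # iommu domain-id in use?
        self.domid_map = [None] * nr_did       # iommu did -> guest domid
        self.dev_count = {}                    # guest domid -> devices here

    def lookup(self, guest_domid):
        """Return the iommu domain-id for a guest, or None."""
        for did, owner in enumerate(self.domid_map):
            if self.domid_bitmap[did] and owner == guest_domid:
                return did
        return None

    def assign_device(self, guest_domid):
        did = self.lookup(guest_domid)
        if did is None:
            if False not in self.domid_bitmap:
                raise RuntimeError("no free iommu domain-id on this unit")
            did = self.domid_bitmap.index(False)
            self.domid_bitmap[did] = True
            self.domid_map[did] = guest_domid
        self.dev_count[guest_domid] = self.dev_count.get(guest_domid, 0) + 1
        return did

    def deassign_device(self, guest_domid):
        did = self.lookup(guest_domid)
        if did is None:
            raise KeyError(guest_domid)
        self.dev_count[guest_domid] -= 1
        if self.dev_count[guest_domid] == 0:
            # Last device of this guest behind this unit: release the id.
            del self.dev_count[guest_domid]
            self.domid_bitmap[did] = False
            self.domid_map[did] = None
```

Because each unit has its own bitmap, two units with NR_DID ids each
can serve up to 2 * NR_DID guests between them, which the single
global bitmap could not.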
Keir Fraser [Tue, 8 Dec 2009 07:49:54 +0000 (07:49 +0000)]
xend: Add keymap to vfb config for existing hvm guests
I submitted a patch a while back to add keymap to vfb config for hvm
guests. This patch works fine for new config (xm create|new) but not
existing, managed guests. To cover the latter case I've introduced a
validator method in XendConfig.
Signed-off-by: Jim Fehlig <jfehlig@novell.com>
Keir Fraser [Tue, 8 Dec 2009 07:48:45 +0000 (07:48 +0000)]
Make tsc_mode=3 (pvrdtscp) work correctly.
Initial tsc_mode patch contained a rough cut at pvrdtscp mode. This
patch gets it working correctly. For the record, pvrdtscp mode allows
an application to obtain information from Xen to descale/de-offset
a physical tsc value to obtain "nsec since VM start". Though the
raw tsc value may change across migration due to different Hz rates
and different start times of different physical machines, applying
the pvrdtscp algorithm to a raw tsc value guarantees that the result
will always be both a fixed known rate (nanoseconds) and monotonically
increasing. BUT, pvrdtscp will only be fast on physical machines that
support the rdtscp instruction AND on which tsc is "safe"; on other
machines both the rdtsc and rdtscp instructions will be emulated.
Also note that when tsc_mode=3 is enabled, tsc-sensitive applications
that do NOT implement the pvrdtscp algorithm will behave incorrectly.
So, tsc_mode=3 should only be used when all apps are either
tsc-resilient or pvrdtscp-modified, and only has a performance
advantage on very recent generation processors.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
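The "descale/de-offset" step mentioned in the tsc_mode=3 entry above
can be sketched as fixed-point arithmetic. The parameter names and the
exact convention used here (a shift followed by a 32-bit fractional
multiplier) are assumptions for illustration; the actual algorithm and
how the parameters are obtained from Xen are defined by the pvrdtscp
documentation and example code, not by this sketch.

```python
MASK64 = (1 << 64) - 1


def tsc_to_nsec(tsc, offset, mul_frac, shift):
    """Convert a raw TSC value to nanoseconds since VM start.

    offset   : raw TSC value corresponding to VM start (assumed name)
    mul_frac : nanoseconds per (shifted) tick as a 32.32-style
               fraction, i.e. scaled by 2**32 (assumed convention)
    shift    : pre-scaling shift, negative meaning shift right
    """
    delta = (tsc - offset) & MASK64     # de-offset
    if shift >= 0:
        delta <<= shift
    else:
        delta >>= -shift
    return (delta * mul_frac) >> 32     # descale to nanoseconds
```

For a 2 GHz TSC, for example, each tick is 0.5 ns, so mul_frac would be
2**31 with a shift of 0 under this convention.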
Keir Fraser [Tue, 8 Dec 2009 07:47:52 +0000 (07:47 +0000)]
libxenlight: implement cdrom insert/eject
This patch implements functions in libxenlight to change the cdrom in
a VM at run time and to handle cdrom eject requests from guests.
This patch adds two new commands to xl: cd-insert and cd-eject; it
also modifies xl to handle cdrom eject requests coming from guests
(actually coming from qemu).
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 8 Dec 2009 07:45:15 +0000 (07:45 +0000)]
fs-backend: add a backend cleanup function
This patch implements a backend cleanup function in fs-backend so that
when the connection to the frontend is closed we don't leak nodes on
xenstore.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 8 Dec 2009 07:44:45 +0000 (07:44 +0000)]
libxenlight: minimal vfs support
This patch adds minimal support for fs-backend and minios' fs-front
to libxenlight:
- it creates a vfs directory on the stubdom's xenstore
device path and allows the stubdom to write to it;
- it doesn't try to cleanly shut down the vfs backend.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Mon, 7 Dec 2009 14:10:27 +0000 (14:10 +0000)]
Keir Fraser [Sat, 5 Dec 2009 12:32:34 +0000 (12:32 +0000)]
x86_32: Fix build after 20575:0930d17589a6
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sat, 5 Dec 2009 12:30:46 +0000 (12:30 +0000)]
libxenlight: physmap slack for pv domains
Contemplate a memory space slack for PV domains,
since they do ballooning (or flipping network rx)
and need some extra room in their pfn space.
Note that this does not allocate any extra memory
to the domain, it simply extends the physmap with
some extra room for "bounce buffering back" pfn's
that are yielded to dom0.
The default slack is set at 8MB.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Sat, 5 Dec 2009 12:29:48 +0000 (12:29 +0000)]
Keir Fraser [Fri, 4 Dec 2009 07:11:44 +0000 (07:11 +0000)]
libxenlight: get state for one domain
Simple function to get the dominfo state of a single domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:11:06 +0000 (07:11 +0000)]
libxenlight: domain resume
Added libxenlight implementation for resume domain.
This brings back a cooperative pv domain from the
shutdown state after save, enabling checkpointing.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:10:22 +0000 (07:10 +0000)]
libxenlight: Destroy device model only for domains that have it
Destroy device model only for domains that have it.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:09:44 +0000 (07:09 +0000)]
libxenlight: avoid writing empty values to xenstore
Prevent segmentation fault caused by empty values
in key-value pairs for the /vm/ subdirectory
when restoring a pv domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:06:47 +0000 (07:06 +0000)]
libxenlight: disk and nic destroy calls
Expose disk and nic device destroy calls
Also removes the obsolete device shutdown calls.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:03:45 +0000 (07:03 +0000)]
libxenlight: refactor libxl destroy code
Refactor libxl device destroy code. Abstract function
waiting for the watch on the state node to fire.
Create a generic device delete function.
Only a single LIBXL_DESTROY_TIMEOUT elapses when
waiting for destruction of all the devices of a
domain.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:02:49 +0000 (07:02 +0000)]
libxenlight: fix GC when cloning contexts
Provide a function to clone a context. This is necessary
because simply copying the structs will eventually
corrupt the GC: maxsize is updated in the cloned context
but not in the originating one, yet they share the same array
of referenced pointers, alloc_ptrs.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Fri, 4 Dec 2009 07:00:25 +0000 (07:00 +0000)]
xend: Fix parameters to PyArg_ParseTupleAndKeywords()
The kwd_list parameter of PyArg_ParseTupleAndKeywords() must be a
NULL-terminated list.
Signed-off-by: KUWAMURA Shin'ya <kuwa@jp.fujitsu.com>
Keir Fraser [Fri, 4 Dec 2009 06:59:33 +0000 (06:59 +0000)]
x86: XENMEM_add_to_physmap should propagate errors from guest_physmap_add_page().
Authored-by: David Lively
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:58:08 +0000 (06:58 +0000)]
Add keyhandler 'g' to print all active grant table entries.
Authored-By: Robert Phillips
Signed-off-By: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Keir Fraser [Fri, 4 Dec 2009 06:51:53 +0000 (06:51 +0000)]
libxenlight: Get rid of the dependency on the LIBCONFIG_SOURCE directory.
Signed-off-by: Jean Guyader <jean.guyader@eu.citrix.com>
Keir Fraser [Fri, 4 Dec 2009 06:50:46 +0000 (06:50 +0000)]
libxenlight: Delete dep files on 'make clean', and include them in Makefile rules.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 3 Dec 2009 13:52:02 +0000 (13:52 +0000)]
grant-tables: do not fail attempts to GNTTABOP_set_version to the current version.
...even if there are active grants.
This triggers when checkpointing a guest, which essentially resumes
without actually having gone through the suspend, so the domain is
already latched to v2 inside Xen.
Also return the current actual version on success and failure. Not
terribly useful with only 2 options but is more robust to future
developments.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
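The behaviour described in the GNTTABOP_set_version entry above can be
sketched as a small decision function. The function name and the errno
values used for failures are placeholders chosen for illustration; the
hypercall's real status codes are not stated in the entry.

```python
import errno


def set_version(current, requested, active_grants):
    """Return (rc, version) for a set-version request.

    The version actually in effect is reported on success and on
    failure. Asking for the version already in use succeeds even when
    grants are active, which is the checkpoint/resume case.
    """
    if requested not in (1, 2):
        return -errno.EINVAL, current      # placeholder error code
    if requested == current:
        return 0, current                  # no-op, always allowed
    if active_grants:
        return -errno.EBUSY, current       # placeholder error code
    return 0, requested
```

A resuming guest that is already latched to v2 and asks for v2 again
gets (0, 2) back instead of a failure.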
Keir Fraser [Thu, 3 Dec 2009 13:51:20 +0000 (13:51 +0000)]
xend: Add GPL license stanza to MemoryPool.py
Signed-off-by: James Song (Wei) <jsong@novell.com>
Keir Fraser [Thu, 3 Dec 2009 13:50:43 +0000 (13:50 +0000)]
Remus: fall back to xenstore if necessary
This is primarily for pvops until it gets a dedicated suspend
event channel.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Thu, 3 Dec 2009 13:50:14 +0000 (13:50 +0000)]
Remus: fix shadow memory allocation, broken by 20558:4ed3b9b1de3f
This approach is perhaps a little cleaner than directly calling
balloon.free.
Signed-off-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 18:46:14 +0000 (18:46 +0000)]
x86 hvm: fix up the unified HAP nested-pagefault handler.
A guest PFN may have been marked dirty and switched to p2m_ram_rw by
another CPU between the VMEXIT and lookup in this handler, so
we can't just check for p2m_ram_logdirty. Also, handle_mmio
doesn't handle passthrough MMIO.
Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:43:28 +0000 (18:43 +0000)]
xentop: Allow full domain name display
Add a '-f' option to xentop to allow the full domain name to be
displayed. This is the original behavior which can cause the display
to be unaligned. Customers have requested this because only the
trailing characters of their domain names are unique and therefore
cannot be distinguished when the display is limited to a 10 character
width.
Signed-off-by: Charles Arnold <carnold@novell.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:36 +0000 (18:42 +0000)]
libxenlight: fix multiple xenstore watches problem
This patch fixes the multiple xenstore watches problem in libxenlight
by opening a new xenstore connection to set and read temporary watches
on the device state nodes. This way they don't interfere with other
long-running watches.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:42:03 +0000 (18:42 +0000)]
libxenlight: use watch and select in libxl_wait_for_device_model
This patch reimplements libxl_wait_for_device_model using a xenstore
watch and a select loop.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 18:41:31 +0000 (18:41 +0000)]
libxenlight: fix dm_xenstore_record_pid
The function dm_xenstore_record_pid is executed by a child of the main
process and therefore shouldn't use the same xenstore connection:
currently it opens a new connection but still uses the old one.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 13:45:35 +0000 (13:45 +0000)]
xenstat: Fixes for 20528:e6e3bf767d16 (stats for dom0 network bonding)
In the above c/s I introduced dom0 statistics for the case where
network bonding is used. The indentation did not match the xenstat C
codebase, and some modifications were also made to the logic, mainly
not using the parsed variables we don't care about (we care only about
{tx|rx}{bytes,packets,errs,drops} and no other variable from
/proc/net/dev) by passing NULLs for them. Also the dom0 statistics
alteration was fixed to include {tx|rx}{drop,errs} for dom0 (the
previous version of my patch did not have this code applied).
Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Wed, 2 Dec 2009 13:43:37 +0000 (13:43 +0000)]
xend, vt-d: do not reserve vtd_mem if iommu is not enabled
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Wed, 2 Dec 2009 13:39:07 +0000 (13:39 +0000)]
vmx: During task-switch, read instr-len VMCS field only when valid.
Otherwise we can crash on the BUG_ON() in __get_instruction_length().
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:52:50 +0000 (08:52 +0000)]
VT-d: Fix indentation to make log messages more readable in dmar.c
Signed-off-by: Weidong Han <weidong.han@intel.com>
Keir Fraser [Wed, 2 Dec 2009 08:51:59 +0000 (08:51 +0000)]
pci: Correct BDF format from B:D:F to B:D.F in log messages.
Signed-off-by: Weidong Han <weidong.han@intel.com>
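For reference, the B:D.F notation named in the pci entry above puts a
colon between bus and device and a dot before the function. A minimal
formatting sketch, with a hypothetical helper name and an assumed
zero-padded hexadecimal layout:

```python
def format_bdf(bus, dev, func):
    """Format a PCI address as bus:device.function (B:D.F)."""
    return "%02x:%02x.%x" % (bus, dev, func)
```

For example, bus 0, device 0x1f, function 2 is rendered as "00:1f.2"
rather than "00:1f:2".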
Keir Fraser [Wed, 2 Dec 2009 08:51:12 +0000 (08:51 +0000)]
xend: Memory pool for pv guest on systems with >128G memory
The main idea of this patch is:
1) The admin sets aside some memory below 128G for 32-bit paravirtual
domain creation (via dom0_mem=-<value> on the kernel command line).
2) The admin also explicitly states to the tools (i.e. xend) how much
memory is supposed to be left untouched by 64-bit domains.
3) If a 32-bit pv DomU gets created, no ballooning ought to be
necessary (since if it is, no guarantee can be made about the address
range of the memory ballooned out), and memory gets allocated from the
reserved range.
4) Upon 64-bit (or 32-bit HVM or HVM) DomU creation, the tools
determine the amount of memory to be ballooned out of Dom0 by adding
the amount needed for the new guest and the amount still in the
reserved pool (and then of course subtracting the total amount of
memory the hypervisor has available for guest use).
Signed-off-by: james song (wei) <jsong@novell.com>
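Step 4 of the description above reads as a simple shortfall calculation.
The following is a minimal sketch of that arithmetic only, not xend's
actual code; the function name, the parameter names, and the KiB unit are
assumptions made for illustration.

```python
def dom0_balloon_amount_kib(guest_need_kib, reserved_pool_kib, free_kib):
    """Memory to balloon out of Dom0 for a 64-bit or HVM guest (hypothetical helper).

    guest_need_kib    -- memory required by the new guest
    reserved_pool_kib -- memory still held in the reserved 32-bit PV pool
    free_kib          -- memory the hypervisor has available for guest use
    """
    shortfall = guest_need_kib + reserved_pool_kib - free_kib
    # Nothing needs to be ballooned out if enough memory is already free.
    return max(0, shortfall)
```

With a 4 GiB guest, 2 GiB left in the pool, and 5 GiB free, this sketch
asks Dom0 to give up 1 GiB, so the reserved pool stays untouched.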
Keir Fraser [Wed, 2 Dec 2009 08:48:36 +0000 (08:48 +0000)]
VT-d: get rid of hardcode in iommu_flush_cache_entry
Currently iommu_flush_cache_entry uses a fixed size 8 bytes to flush
cache. But it also needs to flush caches with different sizes,
e.g. struct root_entry is 16 bytes. This patch fixes the hardcode by
using a parameter "size" to flush caches with different sizes.
Signed-off-by: Weidong Han <weidong.han@intel.com>
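As an illustration of why the flush needs a "size" parameter, the sketch
below lists the cache lines that overlap a region of a given size. It is
a model in Python rather than the hypervisor's C code; the function name
and the 64-byte default line size are assumptions.

```python
def cache_lines_to_flush(addr, size, line_size=64):
    """Return the start address of every cache line overlapping [addr, addr + size).

    Hypothetical helper: a real flush would issue one cache-line flush
    per returned address.
    """
    if size <= 0:
        return []
    first = addr - (addr % line_size)
    last_byte = addr + size - 1
    last = last_byte - (last_byte % line_size)
    return list(range(first, last + line_size, line_size))
```

A fixed 8-byte flush always touches at most the lines covering 8 bytes,
whereas passing the real structure size covers every line the structure
occupies.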
Keir Fraser [Wed, 2 Dec 2009 08:47:49 +0000 (08:47 +0000)]
xm: fix message in OptionError deprecated since Python 2.6
BaseException.message has been deprecated since Python 2.6. To
prevent DeprecationWarning from popping up over this pre-existing
attribute, use a new property that takes lookup precedence.
Signed-off-by: Wei Kong <weikong.cn@gmail.com>
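The approach described above can be sketched as follows. This is an
illustrative stand-in, not the code from xm: the constructor signature
and the private attribute name are assumptions. A property defined on
the subclass is found before the deprecated BaseException.message during
attribute lookup, so reading or writing .message no longer triggers the
warning on Python 2.6. (Python 3 has no BaseException.message, so there
the property simply behaves as an ordinary attribute.)

```python
class OptionError(Exception):
    """Illustrative option-parsing error carrying a message and usage text."""

    def __init__(self, message, usage=""):
        Exception.__init__(self, message)
        self.message = message  # routed through the property setter below
        self.usage = usage

    def _get_message(self):
        return self._message

    def _set_message(self, value):
        self._message = value

    # Class-level property: takes lookup precedence over the inherited,
    # deprecated BaseException.message attribute.
    message = property(_get_message, _set_message)

    def __str__(self):
        return self.message
```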
Keir Fraser [Wed, 2 Dec 2009 08:46:47 +0000 (08:46 +0000)]
docs: new tsc_mode VM configuration option
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 2 Dec 2009 08:46:11 +0000 (08:46 +0000)]
remus: Skip Linux-specific build components on other OSes
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Brendan Cully <brendan@cs.ubc.ca>
Keir Fraser [Wed, 2 Dec 2009 08:45:16 +0000 (08:45 +0000)]
libxenlight: write stubdoms logs to file
It turns out that there is a better way to write stubdoms logs to file
than using libxl_console_attach: qemu is the one that provides the
console backend for stubdoms and qemu is able to redirect a serial to
file, so we can use this feature to make sure the first stubdom
console is always redirected to a logfile.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 2 Dec 2009 08:44:40 +0000 (08:44 +0000)]
libxenlight: two small fixes
- set the domid of the guest and not the one of the stubdom in the
libxl_device_model_starting returned to the user;
- check that the length of the two strings matches in
libxl_name_to_domid, otherwise we can get a match for two different
domains that have the same initial part of the name.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
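The second fix can be illustrated with a small model of the lookup. This
is a Python sketch of the matching logic only, not libxl's C code; the
helper names and the (domid, name) list are hypothetical, and which
string's length the original comparison used is an assumption.

```python
def prefix_match(candidate, wanted):
    """Model of a length-limited compare: only len(candidate) characters are checked."""
    return wanted[:len(candidate)] == candidate


def exact_match(candidate, wanted):
    """Fixed comparison: the lengths must agree before the contents are compared."""
    return len(candidate) == len(wanted) and prefix_match(candidate, wanted)


def name_to_domid(domains, wanted, match=exact_match):
    """Return the domid of the first domain whose name matches, or None."""
    for domid, name in domains:
        if match(name, wanted):
            return domid
    return None
```

With domains named "web" and "web-backup", the prefix-only comparison
resolves "web-backup" to the domid of "web"; the length check avoids
that.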
Keir Fraser [Wed, 2 Dec 2009 08:44:10 +0000 (08:44 +0000)]
libxl: include signal.h, required for SIGKILL definition
...makes libxl build on NetBSD.
Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Tue, 1 Dec 2009 14:19:28 +0000 (14:19 +0000)]
x86: Correctly allocate module-relocation area and bzimage headroom.
Without this patch, loading a bzimage dom0 kernel while also
requesting a dynamically-allocated crashkernel area is broken.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:08:27 +0000 (14:08 +0000)]
hvmloader: Fix bug in 20510:749b5d46e7a9 (GPE notifications)
The GPE notification decision tree was inverted.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:03:42 +0000 (14:03 +0000)]
libxenlight: wait for pv qemu initialization
This patch makes libxl_create_stubdom wait for pv qemu to be properly
initialized before unpausing the stubdom.
A new libxl_device_model_starting pointer is used to wait for pv qemu
initialization while the libxl_device_model_starting pointer given by
the user is initialized to a new structure with an empty for_spawn
member, because nothing that was spawned has to be waited for anymore.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 14:02:00 +0000 (14:02 +0000)]
x86: fix MCE/NMI injection
This attempts to address all the concerns raised in
http://lists.xensource.com/archives/html/xen-devel/2009-11/msg01195.html,
but I'm nevertheless still not convinced that all aspects of the
injection handling really work reliably. In particular, while the
patch here on top of the fixes for the problems mentioned in the
referenced mail also adds code to keep send_guest_trap() from
injecting multiple events at a time, I don't think this is the right
mechanism - it should be possible to handle NMI/MCE nested within
each other.
Another fix on top of the ones for the earlier described problems is
that the vCPU affinity restore logic didn't account for software
injected NMIs - these never set cpu_affinity_tmp, but due to it most
likely being different from cpu_affinity it would have got restored
(to a potentially random value) nevertheless.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Tue, 1 Dec 2009 13:59:47 +0000 (13:59 +0000)]
xen: turn numa=on by default
I did some benchmark runs (lmbench & kernel compile) with a number of
guests running in parallel to compare the performance of numa=on vs.
numa=off. As soon as one starts to load the machine, the performance
goes down in the numa=off case. The tests were done on an 8-node
machine (4 cores each). lmbench (actually copying large amounts of
memory) shows a dramatic drop, but I even noticed a significant
performance decrease for a tmpfs-based Linux kernel compile. Here is a
summary of the data:
lmbench's rd benchmark (normalized to native Linux (=100)):
guests         numa=off             numa=on         avg increase
            min    avg    max    min    avg    max
1                 78.0                102.3
7          37.4   45.6   62.0   90.6  102.3  110.9  124.4%
15         21.0   25.8   31.7   41.7   48.7   54.1   88.2%
23         13.4   17.5   23.2   25.0   28.0   30.1   60.2%
kernel compile in tmpfs, 1 VCPU, 2GB RAM, average of elapsed time:
guests  numa=off  numa=on  increase
1        480.610  464.320      3.4%
7        482.109  461.721      4.2%
15       515.297  477.669      7.3%
23       548.427  495.180      9.7%
again with 2 VCPUs and make -j2:
1        264.580  261.690      1.1%
7        279.763  258.907      7.7%
15       330.385  272.762     17.4%
23       463.510  390.547     15.7%  (46 VCPUs on 32 pCPUs)
Selected tests on a 4-node machine showed similar behavior (7.9%
increase with 6 parallel guests on the 2 VCPU kernel compile
benchmark).
Note that this does not affect non-NUMA machines at all, since NUMA
will be turned off again by the code if no NUMA topology is detected.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
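The "increase" column of the 1-VCPU kernel-compile table appears to be
the relative reduction in elapsed time when going from numa=off to
numa=on. The sketch below recomputes it under that assumption; the
function and variable names are illustrative, and the figures are copied
from the table above.

```python
def elapsed_time_reduction_pct(numa_off, numa_on):
    """Relative reduction in elapsed time, in percent of the numa=off time."""
    return (numa_off - numa_on) / numa_off * 100.0


# (guests, numa=off elapsed time, numa=on elapsed time), 1 VCPU, 2GB RAM
ONE_VCPU_RUNS = [
    (1, 480.610, 464.320),
    (7, 482.109, 461.721),
    (15, 515.297, 477.669),
    (23, 548.427, 495.180),
]


def one_vcpu_reductions():
    """Map guest count to the reduction rounded to one decimal place."""
    return {
        guests: round(elapsed_time_reduction_pct(off, on), 1)
        for guests, off, on in ONE_VCPU_RUNS
    }
```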
Keir Fraser [Tue, 1 Dec 2009 13:57:02 +0000 (13:57 +0000)]
libxc: pass the restore_context through function and allocate the context on the restore function stack.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:56:26 +0000 (13:56 +0000)]
libxc: pass the suspend_context through function and allocate the context on the save function stack.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:55:50 +0000 (13:55 +0000)]
libxc: move the domain_info_context into the restore_context
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:55:15 +0000 (13:55 +0000)]
libxc: move domain_info_context into the save_context
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:54:36 +0000 (13:54 +0000)]
libxc: move restore global variable to a global static context
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:54:01 +0000 (13:54 +0000)]
libxc: create a global context structure to record global variables in save
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:53:14 +0000 (13:53 +0000)]
libxc: create a domain_info_context structure to store guest_width and p2m_size for macros.
Macros now refer to guest_width and p2m_size through a dinfo pointer.
Signed-off-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
Keir Fraser [Tue, 1 Dec 2009 13:49:33 +0000 (13:49 +0000)]
libxenlight: enables less than maximum vcpus
Enable turning on a different number of vcpus than
the maximum during domain creation/restore.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:48:48 +0000 (13:48 +0000)]
libxenlight: allow domain to publish its suspend evtchn
Allow domain to publish its suspend event channel.
Otherwise, the fast event-channel-based suspend
path is disabled.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
Keir Fraser [Tue, 1 Dec 2009 13:48:03 +0000 (13:48 +0000)]
libxenlight: write vcpu availability paths in xenstore
Write cpu availability paths to xenstore. Otherwise,
no vcpus other than the first are enabled.
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>
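As a rough illustration of what "vcpu availability paths" means, the
sketch below builds the per-vcpu entries that would be written. It does
not talk to xenstore; the helper name is hypothetical, and the path
layout (/local/domain/<domid>/cpu/<n>/availability with the values
"online" and "offline") is assumed here rather than taken from the
patch.

```python
def vcpu_availability_entries(domid, max_vcpus, online_vcpus):
    """Return a {xenstore path: value} dict marking the first online_vcpus as online."""
    if not 0 < online_vcpus <= max_vcpus:
        raise ValueError("online_vcpus must be between 1 and max_vcpus")
    base = "/local/domain/%d/cpu" % domid
    return {
        "%s/%d/availability" % (base, vcpu): (
            "online" if vcpu < online_vcpus else "offline"
        )
        for vcpu in range(max_vcpus)
    }
```

For a domain created with a maximum of 4 vcpus but only 2 turned on, the
sketch marks vcpus 0 and 1 online and vcpus 2 and 3 offline.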
Keir Fraser [Tue, 1 Dec 2009 13:47:18 +0000 (13:47 +0000)]
libxenlight: remove vss and xapi path on domain destroy
Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.com>